fix(vision): scope the no-vision capability error to the latest user image#8180
Conversation
…image Sending an image to a model_provider without vision support (and with no vision_model_provider configured) raised a provider_capability_error AND left the [IMAGE:] marker in the long-lived session history. The capability error was triggered by a history-wide marker count, so every later turn, even plain text, re-counted the stale marker and re-failed forever. The RPC/streaming path makes this permanent: it persists the user message into the session history before the loop runs, so a failed image turn leaves its marker behind. A single image to a non-vision provider made the session unusable until restart. Scope the capability error to the most recent genuine user message: - providers/multimodal: add count_latest_user_image_markers(), the turn-scoped counterpart to count_user_image_markers (skips tool-result carriers and older user messages). - runtime/turn/vision_route: error only when the latest user message carries an image (the user just sent something we cannot see); a carried-over marker, from an earlier failed turn or a vision to non-vision model switch, degrades to text-only (markers stripped) so the turn continues. This also covers the model-switch case. The user is still told once, on the turn they send the image; subsequent text turns recover instead of re-failing. Tests: - providers: count_latest_user_image_markers_scopes_to_newest_user_message - runtime: run_tool_call_loop_degrades_carried_over_image_on_non_vision_provider (end-to-end through the shared engine; asserts the carried-over marker is stripped and the plain-text turn succeeds)
singlerider
left a comment
There was a problem hiding this comment.
First review (no prior reviews/comments). Verified at head a9084c42 (CI green, MERGEABLE). Read the source; did not run local Cargo. All 🟢; approving. This fixes a real session-bricking bug at the right chokepoint.
🟢 The latest-vs-carried-over distinction is the correct fix
resolve_vision_provider (vision_route.rs:12) now branches on count_latest_user_image_markers (the newest genuine user message) rather than the history-wide count. The three arms are exactly right:
- User just sent an unviewable image (
latest_user_image_marker_count > 0, novision_model_provider): surfaceProviderCapabilityErrorso the attachment is not silently dropped. - Only carried-over/tool-result markers remain (
else):degrade_strip_images = true, WARN once, continue text-only. This is what stops a single failed image turn from re-failing every later plain-text turn forever, which is the reported session-until-restart bug. vision_model_providerconfigured: route to it; misconfigured non-vision route surfaces loudly.
🟢 No SSOT violation and the degrade path is non-destructive
count_latest_user_image_markers (multimodal.rs:293) reuses the same is_prompt_tool_result_message genuine-user predicate as count_user_image_markers; it is the turn-scoped counterpart, not a second source of truth. The degrade flag is threaded from resolve_vision_provider (turn/mod.rs:412) into prepare_messages_for_iteration (:439), which strips markers via strip_media_markers on a COPY of the outbound messages (vision_route.rs:138), so the long-lived session history is never mutated and the surrounding caption/metadata text survives. No filesystem path or data URI reaches the text-only provider.
🟢 Single chokepoint, attributed logging, honest disclosure
resolve_vision_provider is the one call site inside run_tool_call_loop (turn/mod.rs), so the behaviour change reaches every transport (RPC/streaming, non-streaming, channels) uniformly. The degrade WARN carries category/outcome/attrs (not a bare record!). Tests pin all three arms (count_latest_user_image_markers_scopes_to_newest_user_message, run_tool_call_loop_degrades_carried_over_image_on_non_vision_provider, and the preserved run_tool_call_loop_returns_structured_error_for_non_vision_provider). The disclosed "1 failed" runtime test is the pre-existing cron::store test-isolation flake (process-global log broadcast race, passes in isolation, isolated by CI nextest), unrelated to this diff which touches no cron code; official CI is green.
Additive, risk:high handled correctly, scope tight (3 files). Approving.
…image (zeroclaw-labs#8180) Sending an image to a model_provider without vision support (and with no vision_model_provider configured) raised a provider_capability_error AND left the [IMAGE:] marker in the long-lived session history. The capability error was triggered by a history-wide marker count, so every later turn, even plain text, re-counted the stale marker and re-failed forever. The RPC/streaming path makes this permanent: it persists the user message into the session history before the loop runs, so a failed image turn leaves its marker behind. A single image to a non-vision provider made the session unusable until restart. Scope the capability error to the most recent genuine user message: - providers/multimodal: add count_latest_user_image_markers(), the turn-scoped counterpart to count_user_image_markers (skips tool-result carriers and older user messages). - runtime/turn/vision_route: error only when the latest user message carries an image (the user just sent something we cannot see); a carried-over marker, from an earlier failed turn or a vision to non-vision model switch, degrades to text-only (markers stripped) so the turn continues. This also covers the model-switch case. The user is still told once, on the turn they send the image; subsequent text turns recover instead of re-failing. Tests: - providers: count_latest_user_image_markers_scopes_to_newest_user_message - runtime: run_tool_call_loop_degrades_carried_over_image_on_non_vision_provider (end-to-end through the shared engine; asserts the carried-over marker is stripped and the plain-text turn succeeds)
Summary
mastervision_model_providerconfigured) raisedprovider_capability_error capability=visionAND left the
[IMAGE:]marker in the long-lived session history. Because the error waskeyed off a history-wide marker count, every later turn (even plain text) re-counted the
stale marker and re-failed forever. The RPC/streaming path makes it permanent: it persists
the inbound user message into the session history before the turn loop runs, so a failed
image turn leaves its marker behind. A single image to a non-vision provider made the
session unusable until restart.
resolve_vision_providernow errors only when the user just sent an image we cannot see; a carried-over marker
(a prior failed image turn, or a vision -> non-vision model switch mid-session) degrades to
text-only (markers stripped, surrounding text preserved) so the conversation continues.
count_latest_user_image_markers()inzeroclaw-providers, the turn-scoped counterpartto
count_user_image_markers(), reusing the same genuine-user-message predicate.vision_model_provideris configured, no change totool-result image degradation (already degraded), no change to the channel orchestrator path
(which never persisted failed-turn images). No config / CLI / API / env surface change.
resolve_vision_provideris the single shared chokepoint insiderun_tool_call_loop(one call site,turn/mod.rs), so the behaviour change reaches everytransport (RPC/streaming, non-streaming, channels). The new function is purely additive.
bugapplied.risk:andsize:are auto-applied by the repo labeler(the path scope labels
agent/provider/runtimeare already on); the auto-risk ruleclassifies runtime-path changes as higher risk, so risk/size are left to the automation.
Validation Evidence (required)
Toolchain pinned to CI's
1.93.0.Tail output:
persists the user message to the long-lived session history before the loop). Confirmed the
single call site of
resolve_vision_provider. Reproduced the fix end-to-end through theshared engine (carried-over image -> stripped to
[media attachment], plain-text turnsucceeds). Confirmed the first-turn capability error is preserved.
cron::store::tests::remove_job_emits_structured_cron_delete_eventis a pre-existingtest-isolation flake (a UUID assert against a process-global log broadcast; a sibling
remove_jobcaller races its event in under in-process parallelism). The diff touches nocron code, the test passes in isolation, and CI's nextest (process-per-test) isolates it.
--features ci-allclippy combo (needsglib-2.0/libudevsystem libs unavailable on this host). The change touches none of the voice/desktopfeatures
ci-alladds; deferred to CI. A docs-coverage heuristic WARN fired on the newpub fn, but it is an internal cross-crate helper, not a user-facing surface, so no docs areneeded.
Security & Privacy Impact (required)
the image bytes never reach the model in either branch, and no trust boundary or policy check
is affected.
Compatibility (required)
now recovers on the next turn instead of failing permanently.
Rollback (required for
risk: mediumandrisk: high)git revert <sha>(single, self-contained commit; the newfunction is additive, so reverting cleanly restores the prior history-wide-count behaviour).
a
vision_model_provider, which is unaffected by this change.)provider_capability_errorwithcapability=vision, or the degrade WARNno vision route for carried-over/tool-result image marker(s); degrading to text-only. A regression would show the capability error re-firing onplain-text turns after an image, or an image the user just sent being silently dropped.